Tabasco
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
Chen, Mingyang, Li, Tianpeng, Sun, Haoze, Zhou, Yijie, Zhu, Chenzheng, Wang, Haofen, Pan, Jeff Z., Zhang, Wen, Chen, Huajun, Yang, Fan, Zhou, Zenan, Chen, Weipeng
Large Language Models (LLMs) have shown remarkable capabilities in reasoning, exemplified by the success of OpenAI-o1 and DeepSeek-R1. However, integrating reasoning with external search processes remains challenging, especially for complex multi-hop questions requiring multiple retrieval steps. We propose ReSearch, a novel framework that trains LLMs to Reason with Search via reinforcement learning without using any supervised data on reasoning steps. Our approach treats search operations as integral components of the reasoning chain, where when and how to perform searches is guided by text-based thinking, and search results subsequently influence further reasoning. We train ReSearch on Qwen2.5-7B(-Instruct) and Qwen2.5-32B(-Instruct) models and conduct extensive experiments. Despite being trained on only one dataset, our models demonstrate strong generalizability across various benchmarks. Analysis reveals that ReSearch naturally elicits advanced reasoning capabilities such as reflection and self-correction during the reinforcement learning process.
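The interleaving the abstract describes — text-based thinking that decides when to search, with results fed back into further reasoning — can be sketched as a rollout loop. Everything here is a hypothetical stand-in (the `<search>`/`<result>`/`<answer>` tags, `mock_generate`, `mock_search`), not the authors' actual API; it only illustrates the control flow of treating search as part of the reasoning chain.

```python
# Minimal sketch of a reason-with-search rollout in the spirit of ReSearch.
# generate/search are illustrative stand-ins: the model emits text that may
# contain a <search>query</search> span; the rollout pauses, runs retrieval,
# appends the results, and lets the model continue reasoning.
import re

def mock_generate(prompt):
    # Stand-in for an LLM call; a real system samples from the trained policy.
    if "<result>" not in prompt:
        return "I need more facts. <search>capital of France</search>"
    return "The capital of France is Paris. <answer>Paris</answer>"

def mock_search(query):
    # Stand-in for a retriever returning passages for the query.
    return f"Paris is the capital of France. (query: {query})"

def rollout(question, generate=mock_generate, search=mock_search, max_turns=4):
    trace = question
    for _ in range(max_turns):
        step = generate(trace)
        trace += step
        m = re.search(r"<search>(.*?)</search>", step)
        if m:  # the model asked for retrieval mid-reasoning
            trace += f"<result>{search(m.group(1))}</result>"
        elif "<answer>" in step:
            break
    return trace

print(rollout("Question: what is the capital of France? "))
```

In the paper's RL setting, only the final answer is rewarded; the decision of when and what to search emerges from training rather than from supervised reasoning traces.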
Hallucination Diversity-Aware Active Learning for Text Summarization
Xia, Yu, Liu, Xu, Yu, Tong, Kim, Sungchul, Rossi, Ryan A., Rao, Anup, Mai, Tung, Li, Shuai
Large Language Models (LLMs) have shown a propensity to generate hallucinated outputs, i.e., texts that are factually incorrect or unsupported. Existing methods for alleviating hallucinations typically require costly human annotations to identify and correct hallucinations in LLM outputs. Moreover, most of these methods focus on a specific type of hallucination, e.g., entity or token errors, which limits their effectiveness in addressing the various types of hallucinations exhibited in LLM outputs. To the best of our knowledge, this paper proposes the first active learning framework for alleviating LLM hallucinations, reducing the costly human annotation of hallucinations that is needed. By measuring fine-grained hallucinations as errors in semantic frames, discourse, and content verifiability in text summarization, we propose HAllucination Diversity-Aware Sampling (HADAS) to select diverse hallucinations for annotation in active learning for LLM fine-tuning. Extensive experiments on three datasets and different backbone models demonstrate the advantages of our method in effectively and efficiently mitigating LLM hallucinations.
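Selecting *diverse* hallucinations for annotation can be illustrated with a farthest-point greedy rule over per-sample error profiles. The feature vectors (counts of semantic-frame, discourse, and content-verifiability errors) and the distance/selection rule below are illustrative assumptions, not HADAS's exact scoring.

```python
# Hedged sketch of diversity-aware sample selection: greedily pick summaries
# whose hallucination profiles are farthest from those already selected.
def l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def diverse_select(profiles, budget):
    # profiles: one vector per candidate summary, e.g. hypothetical counts of
    # (semantic-frame errors, discourse errors, content-verifiability errors).
    selected = [0]  # seed with the first candidate
    while len(selected) < budget:
        # farthest-point rule: maximize distance to the nearest selected item
        best = max(
            (i for i in range(len(profiles)) if i not in selected),
            key=lambda i: min(l2(profiles[i], profiles[j]) for j in selected),
        )
        selected.append(best)
    return selected

profiles = [(3, 0, 0), (3, 0, 1), (0, 4, 0), (0, 0, 5)]
print(diverse_select(profiles, 3))  # skips sample 1, a near-duplicate of 0
```

Sample 1 differs from sample 0 by a single error, so under a fixed budget the rule spends annotations on the unlike error types instead — the intuition behind diversity-aware sampling.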
- Asia > Singapore (0.05)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- (10 more...)
Towards Modeling Learner Performance with Large Language Models
Neshaei, Seyed Parsa, Davis, Richard Lee, Hazimeh, Adam, Lazarevski, Bojan, Dillenbourg, Pierre, Käser, Tanja
Recent work exploring the capabilities of pre-trained large language models (LLMs) has demonstrated their ability to act as general pattern machines by completing complex token sequences representing a wide array of tasks, including time-series prediction and robot control. This paper investigates whether the pattern recognition and sequence modeling capabilities of LLMs can be extended to the domain of knowledge tracing, a critical component in the development of intelligent tutoring systems (ITSs) that tailor educational experiences by predicting learner performance over time. In an empirical evaluation across multiple real-world datasets, we compare two approaches to using LLMs for this task, zero-shot prompting and model fine-tuning, with existing, non-LLM approaches to knowledge tracing. While LLM-based approaches do not achieve state-of-the-art performance, fine-tuned LLMs surpass the performance of naive baseline models and perform on par with standard Bayesian Knowledge Tracing approaches across multiple metrics. These findings suggest that the pattern recognition capabilities of LLMs can be used to model complex learning trajectories, opening a novel avenue for applying LLMs to educational contexts. The paper concludes with a discussion of the implications of these findings for future research, suggesting that further refinements and a deeper understanding of LLMs' predictive mechanisms could lead to enhanced performance in knowledge tracing tasks.
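The Bayesian Knowledge Tracing baseline the abstract compares against can be sketched compactly: a posterior update of mastery probability from each observed response, with guess/slip parameters, followed by a learning transition. The parameter values below are illustrative defaults, not fitted to any dataset.

```python
# Standard Bayesian Knowledge Tracing (BKT) update, as a compact sketch.
def bkt_trace(responses, p_init=0.2, p_learn=0.15, p_guess=0.2, p_slip=0.1):
    p_mastery = p_init
    trajectory = []
    for correct in responses:
        if correct:  # Bayes update of mastery given a correct answer
            num = p_mastery * (1 - p_slip)
            den = num + (1 - p_mastery) * p_guess
        else:        # Bayes update of mastery given an incorrect answer
            num = p_mastery * p_slip
            den = num + (1 - p_mastery) * (1 - p_guess)
        posterior = num / den
        # learning transition: the skill may be acquired after each practice
        p_mastery = posterior + (1 - posterior) * p_learn
        trajectory.append(p_mastery)
    return trajectory

traj = bkt_trace([1, 1, 0, 1, 1])
print([round(p, 3) for p in traj])
```

The LLM-based alternatives in the paper instead treat the response history as a token sequence and predict the next outcome directly; BKT remains the interpretable reference point they perform on par with.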
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Tennessee > Shelby County > Memphis (0.04)
- North America > Mexico > Tabasco > Villahermosa (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
RJUA-QA: A Comprehensive QA Dataset for Urology
Lyu, Shiwei, Chi, Chenfei, Cai, Hongbo, Shi, Lei, Yang, Xiaoyan, Liu, Lei, Chen, Xiang, Zhao, Deng, Zhang, Zhiqiang, Lyu, Xianguo, Zhang, Ming, Li, Fangzhou, Ma, Xiaowei, Shen, Yue, Gu, Jinjie, Xue, Wei, Huang, Yiran
We introduce RJUA-QA, a novel medical dataset for question answering (QA) and reasoning with clinical evidence, contributing to bridging the gap between general large language models (LLMs) and medical-specific LLM applications. RJUA-QA is derived from realistic clinical scenarios and aims to facilitate LLMs in generating reliable diagnoses and advice. The dataset contains 2,132 curated Question-Context-Answer pairs, corresponding to about 25,000 diagnostic records and clinical cases. The dataset covers 67 common urological disease categories, where the disease coverage exceeds 97.6\% of the population seeking medical services in urology. Each data instance in RJUA-QA comprises: (1) a question mirroring a real patient's inquiry about clinical symptoms and medical conditions, (2) a context including comprehensive expert knowledge, serving as a reference for medical examination and diagnosis, (3) a doctor response offering the diagnostic conclusion and suggested examination guidance, (4) a diagnosed clinical disease as the recommended diagnostic outcome, and (5) clinical advice providing recommendations for medical examination. RJUA-QA is the first medical QA dataset for clinical reasoning over patient inquiries, where expert-level knowledge and experience are required to yield diagnostic conclusions and medical examination advice. A comprehensive evaluation is conducted to assess the performance of both medical-specific and general LLMs on the RJUA-QA dataset. Our data are publicly available at \url{https://github.com/alipay/RJU_Ant_QA}.
- Health & Medicine > Therapeutic Area > Urology (1.00)
- Health & Medicine > Therapeutic Area > Nephrology (1.00)
INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation
Chai, Yuji, Gkountouras, John, Ko, Glenn G., Brooks, David, Wei, Gu-Yeon
We introduce a method that dramatically reduces fine-tuning VRAM requirements and rectifies quantization errors in quantized Large Language Models. First, we develop an extremely memory-efficient fine-tuning (EMEF) method for quantized models using Low-Rank Adaptation (LoRA), and drawing upon it, we construct an error-correcting algorithm designed to minimize errors induced by the quantization process. Our method reduces the memory requirements by up to 5.6 times, which enables fine-tuning a 7 billion parameter Large Language Model (LLM) on consumer laptops. At the same time, we propose a Low-Rank Error Correction (LREC) method that exploits the added LoRA layers to ameliorate the gap between the quantized model and its floating-point counterpart. Our error correction framework leads to a fully functional INT2 quantized LLM with the capacity to generate coherent English text. To the best of our knowledge, this is the first INT2 Large Language Model that has been able to reach such performance. The overhead of our method is merely a 1.05 times increase in model size, which translates to an effective precision of INT2.1. Also, our method readily generalizes to other quantization standards, such as INT3, INT4, and INT8, restoring their lost performance, which marks a significant milestone in the field of model quantization. The strategies delineated in this paper hold promising implications for the future development and optimization of quantized models, marking a pivotal shift in the landscape of low-resource machine learning computations.
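The core idea — low-rank factors correcting the quantization residual so the layer computes with `W_q + B @ A` instead of `W_q` — can be illustrated numerically. Note the hedges: the crude 2-bit quantizer, the matrix sizes, and fitting the factors via truncated SVD of the residual are all illustrative assumptions; the paper trains the LoRA factors rather than solving them in closed form.

```python
# Sketch of low-rank error correction (LREC-style idea): approximate the
# quantization residual W - W_q with a rank-r product B @ A, so the corrected
# layer computes x @ (W_q + B @ A).
import numpy as np

def quantize_int2(W):
    # Crude symmetric 2-bit quantizer (illustrative, not the paper's scheme):
    # values are rounded to the 4-level grid {-2, -1, 0, 1} * scale.
    scale = np.max(np.abs(W)) / 1.5
    return np.clip(np.round(W / scale), -2, 1) * scale

def low_rank_correction(W, W_q, r):
    # Best rank-r fit to the residual, here via truncated SVD.
    U, s, Vt = np.linalg.svd(W - W_q, full_matrices=False)
    B = U[:, :r] * s[:r]          # shape (out, r)
    A = Vt[:r, :]                 # shape (r, in)
    return B, A

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16))
W_q = quantize_int2(W)
B, A = low_rank_correction(W, W_q, r=4)
err_before = np.linalg.norm(W - W_q)
err_after = np.linalg.norm(W - (W_q + B @ A))
print(err_before, err_after)  # the low-rank term shrinks the residual norm
```

The "INT2.1" framing follows from this structure: the rank-r factors add only a small fraction of the original parameter count on top of the 2-bit weights.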
- Europe > France (0.94)
- North America > Canada > Saskatchewan (0.04)
- North America > Canada > Quebec (0.04)
- (15 more...)
Hundreds of ancient ceremonial sites are found hidden in Mexico
Hundreds of newly-discovered ancient ceremonial sites in Mexico reveal how the Mayans adopted a mysterious design trait from the older Olmec civilization more than 3,000 years ago, a study shows. Researchers have revealed that there are 478 ceremonial complexes in modern-day southern Mexico that can't be seen with the naked eye, but can be detected with lidar scanning technology. The hundreds of ceremonial complexes are a combination of Maya and older Olmec sites, according to the study authors. Originating around 2600 BC, the Maya civilization thrived in Central America for nearly 3,000 years, reaching its height between AD 250 and 900. The Olmecs, meanwhile, were another Mesoamerican civilization who occupied the land earlier, from around 2500 to 400 BC.
- North America > Central America (0.25)
- North America > Mexico > Tabasco (0.15)
- North America > United States > Arizona (0.06)
- (5 more...)
AI failings spark doubt over new tech era
Swagatam Sen starts off his day in a way that most people might find strange. "The first thing I do when I wake up in the morning is intentionally go to a website I have zero interest in," said the 39-year-old, who works at a financial institution in the U.K. Around the clock, whether at work or during his personal time, Sen switches back and forth between websites he likes and those he does not, all to trick the artificial intelligence algorithms that track his online activity. Social media platforms and other online services use AI to follow each user's day-to-day internet searches and browsing habits and determine what ads, search results or posts would be most appropriate for them. Worried that his knowledge would skew toward certain favored fields, Sen about a year ago began his quest to baffle the algorithms. Tech companies have tried to harness AI to leverage troves of data, but the digital pollution they generate is making people's real lives worse.
- Asia (0.65)
- Europe > United Kingdom (0.50)
- North America > Mexico > Tabasco (0.25)
- Information Technology (0.69)
- Government > Regional Government (0.30)
- Education > Educational Setting > K-12 Education (0.30)
27 Maya ritual sites discovered on online map by eagle-eyed archaeologist
Researchers have uncovered a 1,500-year-old stucco mask of Maya ruler K'inich Janaab' Pakal. What differentiates this mask from others is that it's seemingly made in the king's likeness. An eagle-eyed archaeologist has used a freely available online map to locate 27 Maya ceremonial sites in Mexico. Takeshi Inomata, a professor of archaeology at the University of Arizona, made the discovery using a LiDAR (Light Detection and Ranging) map he found online last year, according to the New York Times. LiDAR technology harnesses a laser to measure distances to the Earth's surface and can prove extremely valuable for studying what is hidden in areas with thick vegetation.
- North America > United States > Arizona (0.26)
- North America > Guatemala (0.07)
- North America > Mexico > Tabasco (0.06)
- (3 more...)
Pompeo accuses Iran of 'unprecedented attack' after drones hit Saudi oil facilities
The attack comes after Iran exceeded its enriched uranium stockpile limit in the nuclear deal. Secretary of State Mike Pompeo called on the international community to join him Saturday in condemning Iran for drone attacks on two Saudi oil facilities, which he described as "an unprecedented attack on the world's energy supply." "Tehran is behind nearly 100 attacks on Saudi Arabia while [President Hassan] Rouhani and [Foreign Minister Mohammad] Zarif pretend to engage in diplomacy," Pompeo tweeted, referring to the nation's president and foreign affairs minister. "There is no evidence the attacks came from Yemen." Iran-backed Houthi rebels in Yemen claimed responsibility for the attack hours before Pompeo's tweet. The world's largest oil processing facility in Saudi Arabia and a major oil field were impacted, sparking huge fires at a vulnerable chokepoint for global energy supplies. "The United States will work with our partners and allies to ensure that energy markets remain well supplied and Iran is held accountable for its aggression," Pompeo concluded. According to multiple news reports that cited unidentified sources, the drone attacks affected up to half of the supplies from the world's largest exporter of oil, though the output should be restored within days. It remained unclear if anyone was injured at the Abqaiq oil processing facility and the Khurais oil field. Sen. Chris Murphy, D-Conn., who sits on the Senate Foreign Relations Committee, denounced Pompeo's description of the attack, calling it an "irresponsible simplification." "The Saudis and Houthis are at war."
- North America > United States (1.00)
- Asia > Middle East > Yemen (0.76)
- North America > Mexico > Tabasco (0.25)
- (3 more...)
- Government > Foreign Policy (1.00)
- Energy > Oil & Gas > Upstream (1.00)
- Government > Regional Government > North America Government > United States Government (0.74)
- Government > Regional Government > Asia Government > Middle East Government > Iran Government (0.56)
What Are the Invariant Occlusive Components of Image Patches? A Probabilistic Generative Approach
Dai, Zhenwen, Exarchakis, Georgios, Lücke, Jörg
We study optimal image encoding based on a generative approach with non-linear feature combinations and explicit position encoding. By far most approaches to unsupervised learning of visual features, such as sparse coding or ICA, account for translations by representing the same features at different positions. Some earlier models used a separate encoding of features and their positions to facilitate invariant data encoding and recognition. However, all probabilistic generative models with explicit position encoding have so far assumed a linear superposition of components to encode image patches. Here, we apply, for the first time, a model with non-linear feature superposition and explicit position encoding. By avoiding linear superpositions, the studied model represents a closer match to component occlusions, which are ubiquitous in natural images. In order to account for occlusions, the non-linear model encodes patches qualitatively differently from linear models, using component representations separated into mask and feature parameters. We first investigated the encodings learned by the model using artificial data with mutually occluding components. We find that the model extracts the components, and that it can correctly identify the occlusive components with its hidden variables. On natural image patches, the model learns component masks and features for typical image components. Using reverse correlation, we estimate the receptive fields associated with the model's hidden units. We find many Gabor-like or globular receptive fields, as well as fields sensitive to more complex structures. Our results show that probabilistic models capturing occlusions and invariances can be trained efficiently on image patches, and that the resulting encoding represents an alternative model for the neural encoding of images in the primary visual cortex.
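The distinction between occlusive (non-linear) and linear superposition that the abstract hinges on can be shown in a toy 1-D rendering. The mask/value representation and back-to-front ordering below are illustrative simplifications of the paper's mask-and-feature parameterization, not its actual generative model.

```python
# Toy contrast of occlusive vs. linear superposition of image components.
# Each component is a (mask, value) pair; masks are sets of pixel indices.
def render_occlusive(components, size):
    patch = [0.0] * size
    for mask, value in components:  # later components occlude earlier ones:
        for i in mask:              # where masks overlap, the front value wins
            patch[i] = value
    return patch

def render_linear(components, size):
    patch = [0.0] * size
    for mask, value in components:
        for i in mask:
            patch[i] += value       # linear models sum overlapping components
    return patch

comps = [({0, 1, 2}, 0.5), ({2, 3}, 1.0)]
print(render_occlusive(comps, 5))  # pixel 2 shows only the front component
print(render_linear(comps, 5))     # pixel 2 sums both components to 1.5
```

At the overlapping pixel, the linear model predicts an intensity that never occurs in a real occlusion; this mismatch is why the occlusive model is argued to be the closer match to natural image patches.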
- North America > United States > Texas > Kleberg County (0.04)
- North America > United States > Texas > Chambers County (0.04)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- (6 more...)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)